Marina Sangés Ametllé (s223690)
Our data consists of RNA sequencing reads normalized as FPKM (Fragments Per Kilobase of transcript per Million mapped reads). This normalization corrects for gene length and sequencing depth, which lets us quantify gene expression levels.
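As a reminder of what the normalization does, here is a minimal Python sketch of the standard FPKM formula. The function name and the example numbers are illustrative assumptions, not values from our dataset, and this is not the pipeline that produced the data.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Standard FPKM formula (illustrative helper, not the original pipeline).

    FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)
    The factor 1e9 combines "per kilobase" (1e3) and "per million" (1e6).
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Made-up example: 500 fragments on a 2 kb gene in a library of 20 million fragments.
example_value = fpkm(500, 2000, 20_000_000)
print(example_value)
```

Because the count is divided by both gene length and library size, values can be compared between genes of different length and between samples sequenced to different depths.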
The data consists of 60 breast cell samples in total: 30 from tumor cells and 30 from normal cells.
For each sample, the expression of 20246 genes is assessed.
Final dataset size: 20246 observations (genes) of 60 variables (samples).
The main goal of this analysis is to identify the differentially expressed genes between tumor and normal breast cells and the biological processes they affect.
We will consider all the samples, but keep an eye on normal_rep14, normal_rep6 and tumor_rep3, since they show expression in fewer genes than the rest of the samples.
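One way such samples could be spotted is by counting, per sample, how many genes have expression above zero and flagging samples that fall well below the typical count. The sketch below is a hedged illustration in Python: the function names, the 0.8 x median cutoff and the toy values are all assumptions made for the example, not the method or data used in this project.

```python
from statistics import median


def expressed_gene_counts(expression, threshold=0.0):
    """Count genes with expression above `threshold` in each sample.

    `expression` maps a sample name to its list of expression values
    (hypothetical layout chosen for this example).
    """
    return {
        sample: sum(1 for value in values if value > threshold)
        for sample, values in expression.items()
    }


def flag_low_samples(counts, fraction=0.8):
    """Return samples whose count is below `fraction` times the median count.

    The 0.8 default is an arbitrary illustrative cutoff.
    """
    cutoff = fraction * median(counts.values())
    return sorted(sample for sample, count in counts.items() if count < cutoff)


# Toy values invented for the example (5 genes, 3 hypothetical samples).
toy_expression = {
    "sample_a": [1.2, 0.0, 3.4, 5.0, 0.7],
    "sample_b": [0.0, 0.0, 2.1, 0.0, 0.0],
    "sample_c": [0.5, 1.1, 0.0, 2.2, 4.0],
}

counts = expressed_gene_counts(toy_expression)
flagged = flag_low_samples(counts)
print(counts)
print(flagged)
```

Flagged samples are kept in the analysis here; the count only tells us which ones to watch in later plots.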
The two groups separate well, but some samples do not follow the pattern. Are these the samples we flagged earlier?
R for Bio Data Science 17-05-2024